Comparing Distributional and Mirror Translation Similarities for Extracting Synonyms
Abstract
Automated thesaurus construction by collecting relations between lexical items (synonyms, antonyms, etc.) has a long tradition in natural language processing. This has been done by exploiting dictionary structures or distributional context regularities (co-occurrence, syntactic associations, or translation equivalents) in order to define measures of lexical similarity or relatedness. Dyvik proposed using aligned multilingual corpora and defined similar terms as terms that often share their translations. We evaluate the usefulness of this similarity for the extraction of synonyms, compared to the more widespread distributional approach.
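The contrast between the two notions of similarity can be illustrated with a minimal sketch. It assumes that word alignments have already been extracted from a parallel corpus (the alignment step is not shown) and computes both similarities as cosine over raw counts: a word is represented either by the counts of its aligned translations or by the counts of its neighbouring words. This cosine-over-translations measure is a simplification chosen for illustration and is not necessarily the exact measure proposed by Dyvik or evaluated in the paper; all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def translation_vectors(aligned_pairs):
    """Hypothetical helper: map each source word to a Counter of the
    target words it was aligned to (alignment is assumed to be done)."""
    vectors = defaultdict(Counter)
    for source, target in aligned_pairs:
        vectors[source][target] += 1
    return vectors


def context_vectors(sentences, window=2):
    """Hypothetical helper: map each word to a Counter of the words that
    occur within +/- `window` tokens of it (distributional context)."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def rank_candidates(word, vectors, top_k=5):
    """Rank the other words by cosine similarity to `word` (synonym candidates)."""
    # Use .get so an unknown word does not insert a key into the defaultdict.
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy English-French alignments, invented for illustration.
    pairs = [("car", "voiture"), ("car", "auto"), ("car", "voiture"),
             ("automobile", "voiture"), ("automobile", "auto"),
             ("bank", "banque"), ("bank", "rive")]
    tv = translation_vectors(pairs)
    print(rank_candidates("car", tv))

    # Toy monolingual corpus for the distributional side.
    sentences = [["the", "car", "is", "fast"],
                 ["the", "automobile", "is", "fast"]]
    cv = context_vectors(sentences, window=1)
    print(round(cosine(cv["car"], cv["automobile"]), 3))
```

On this toy data, "car" and "automobile" share both their translations ("voiture", "auto") and their contexts, so both measures rank them as close, while "bank" shares no translation with "car" and scores 0 on the translation side. On real corpora the two measures diverge, which is what the comparison in the paper is about.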
Similar resources
Synonym extraction and abbreviation expansion with ensembles of semantic spaces
BACKGROUND: Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, ...
Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction
We propose a novel vector representation that integrates lexical contrast into distributional vectors and strengthens the most salient features for determining degrees of word similarity. The improved vectors significantly outperform standard models and distinguish antonyms from synonyms with an average precision of 0.66–0.76 across word classes (adjectives, nouns, verbs). Moreover, we integrat...
Leveraging Paraphrase Labels to Extract Synonyms from Twitter
We present an approach for automatically learning synonyms from a corpus of paraphrased tweets. The synonyms are learned by using shallow parse chunks to create candidate synonyms and their context windows, and the synonyms are substituted back into a paraphrase detection system that uses machine translation metrics as features for a classifier. We find a 2.29% improvement in F1 when we train a...
Distributional Similarity of Multi-Word Expressions
Most existing systems for automatically extracting lexical-semantic resources neglect multi-word expressions (MWEs), even though approximately 30% of gold-standard thesauri entries are MWEs. We present a distributional similarity system that identifies synonyms for MWEs. We extend Grefenstette’s SEXTANT shallow parser to first identify bigram MWEs using collocation statistics from the Google WE...
Extracting Synonyms from Dictionary Definitions
Automatic extraction of synonyms and/or semantically related words has various applications in Natural Language Processing (NLP). There are currently two mainstream extraction paradigms, namely, lexicon-based and distributional approaches. The former usually suffers from low coverage, while the latter is only able to capture general relatedness rather than strict synonymy. In this paper, two ru...